AITopics | similarity algorithm


similarity algorithm

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Improving ICD-based semantic similarity by accounting for varying degrees of comorbidity

Schneider, Jan Janosch; Adler, Marius; Ammer-Herrmenau, Christoph; König, Alexander Otto; Sax, Ulrich; Hügel, Jonas

arXiv.org Artificial Intelligence, Aug-14-2023

Finding similar patients is a common objective in precision medicine, facilitating treatment outcome assessment and clinical decision support. Choosing widely available patient features and appropriate mathematical methods for similarity calculations is crucial. International Statistical Classification of Diseases and Related Health Problems (ICD) codes are used worldwide to encode diseases and are available for nearly all patients. Aggregated as sets of primary and secondary diagnoses, they can display a degree of comorbidity and reveal comorbidity patterns. Semantic similarity algorithms make it possible to compute the similarity of patients from their ICD codes. These algorithms have traditionally been evaluated on a single-term, expert-rated data set. However, real-world patient data often display varying degrees of documented comorbidity that might impair algorithm performance. To account for this, we present a scale term that considers documented comorbidity variance. In this work, we compared the performance of 80 combinations of established algorithms in terms of semantic similarity based on ICD-code sets. The sets were extracted from patients with a C25.X (pancreatic cancer) primary diagnosis and provide a variety of ICD-code combinations. Using our scale term, we obtained the best results with a combination of level-based information content, Leacock & Chodorow concept similarity, and bipartite graph matching for the set similarities, reaching a correlation of 0.75 with our expert's ground truth. Our results highlight the importance of accounting for comorbidity variance while demonstrating how well current semantic similarity algorithms perform.
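To make the concept-level and set-level steps concrete, here is a minimal Python sketch of Leacock & Chodorow similarity, -log(p / 2D), where p is the number of nodes on the shortest path between two codes and D is the maximum depth of the hierarchy, combined with a maximum-weight bipartite matching between two patients' code sets. The hierarchy is a hypothetical toy excerpt of ICD-10, the normalization by log(2D) and by the larger set size is an illustrative assumption, and neither the level-based information content nor the paper's comorbidity scale term is reproduced here.

```python
import math
from itertools import permutations

# Hypothetical toy excerpt of the ICD-10 hierarchy (child -> parent).
PARENT = {
    "C00-D48": "ROOT",     # chapter: neoplasms
    "C15-C26": "C00-D48",  # block: malignant neoplasms of digestive organs
    "C25": "C15-C26",      # malignant neoplasm of pancreas
    "C25.0": "C25",
    "C25.1": "C25",
    "E00-E90": "ROOT",     # chapter: endocrine, nutritional and metabolic diseases
    "E10-E14": "E00-E90",  # block: diabetes mellitus
    "E11": "E10-E14",
    "E11.9": "E11",
}

def ancestors(code):
    """Return the path from `code` up to ROOT, both inclusive."""
    path = [code]
    while path[-1] != "ROOT":
        path.append(PARENT[path[-1]])
    return path

def depth(code):
    # Depth counted in nodes, so ROOT alone has depth 1.
    return len(ancestors(code))

MAX_DEPTH = max(depth(c) for c in PARENT)  # D in the Leacock & Chodorow formula

def shortest_path_nodes(a, b):
    """Number of nodes on the shortest tree path between codes a and b."""
    up_a, up_b = ancestors(a), ancestors(b)
    in_b = set(up_b)
    lca = next(x for x in up_a if x in in_b)  # lowest common ancestor
    return up_a.index(lca) + up_b.index(lca) + 1

def lch(a, b):
    """Leacock & Chodorow concept similarity: -log(p / (2 * D))."""
    return -math.log(shortest_path_nodes(a, b) / (2 * MAX_DEPTH))

def set_similarity(codes_a, codes_b):
    """Patient similarity via maximum-weight bipartite matching of ICD codes.

    Brute force over all assignments, which is fine for a handful of codes.
    The result is scaled to [0, 1] by log(2D) and by the larger set size
    (an illustrative choice, not taken from the paper).
    """
    if not codes_a or not codes_b:
        return 0.0
    small, large = sorted((list(codes_a), list(codes_b)), key=len)
    best = 0.0
    for perm in permutations(large, len(small)):
        best = max(best, sum(lch(x, y) for x, y in zip(small, perm)))
    return best / (math.log(2 * MAX_DEPTH) * len(large))

print(set_similarity(["C25.0", "E11.9"], ["C25.1", "E11.9"]))
```

The brute-force matching grows factorially with set size; for larger code sets, the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment` with `maximize=True`) solves the same assignment problem in polynomial time.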

artificial intelligence, machine learning, natural language

arXiv.org Artificial Intelligence

2308.07359

Country:

Europe > Germany > Lower Saxony > Gottingen (0.16)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Ohio > Franklin County > Columbus (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Health Care Providers & Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


A Comparison of Document Similarity Algorithms

Gahman, Nicholas; Elangovan, Vinayak

arXiv.org Artificial Intelligence, Apr-3-2023

Document similarity is an important part of Natural Language Processing and is most commonly used for plagiarism detection and text summarization. Thus, finding the overall most effective document similarity algorithm could have a major positive impact on the field of Natural Language Processing. This report sets out to examine the numerous document similarity algorithms and determine which ones are the most useful. It identifies the most effective document similarity algorithms by grouping them into three types: statistical algorithms, neural networks, and corpus/knowledge-based algorithms. The most effective algorithms in each category are also compared using a series of benchmark datasets and evaluations that test every area in which each algorithm could be used.

INTRODUCTION

Document similarity analysis is a Natural Language Processing (NLP) task in which two or more documents are analyzed to recognize the similarities between them. Document similarity is heavily used in text summarization, recommender systems, plagiarism detection, and search engines. The main objective of document similarity analysis is to identify the level of similarity or dissimilarity between two or more documents based on their content.
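As an illustration of the statistical category, here is a minimal sketch of TF-IDF weighting with cosine similarity, using only the Python standard library. The whitespace tokenizer, the tf * log(N / df) weighting variant, and the sample documents are assumptions for illustration; the report's own implementations and benchmarks are not reproduced.

```python
import math
from collections import Counter

def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()

def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (c / len(toks)) * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply today",
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Terms that occur in every document receive an IDF of zero, so they do not contribute to the similarity; the first two documents score higher than the first and third because they share weighted terms.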

artificial intelligence, machine learning, natural language

arXiv.org Artificial Intelligence

2304.0133

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Pennsylvania (0.04)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)


Artificial Intelligence at eBay - Two Current Use-Cases

#artificialintelligence, Jun-28-2022, 18:30:37 GMT

Daniel Faggella is Head of Research at Emerj. Called upon by the United Nations, World Bank, INTERPOL, and leading enterprises, Daniel is a globally sought-after expert on the competitive strategy implications of AI for business and government leaders. The company that would become eBay was founded as a sole proprietorship under the name AuctionWeb in September 1995 by Pierre Omidyar. The company changed its name to eBay in September 1997. Today, eBay is a global e-commerce leader in more than 190 markets throughout the world.

algorithm, ebay, similarity algorithm

#artificialintelligence

Country: North America > United States > California (0.05)

Industry:

Information Technology > Services (1.00)
Consumer Products & Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.35)


User-friendly Comparison of Similarity Algorithms on Wikidata

Ilievski, Filip; Szekely, Pedro; Satyukov, Gleb; Singh, Amandeep

arXiv.org Artificial Intelligence, Aug-11-2021

While the similarity between two concept words has been evaluated and studied for decades, much less attention has been devoted to algorithms that can compute the similarity of nodes in very large knowledge graphs, such as Wikidata. To facilitate investigations and head-to-head comparisons of similarity algorithms on Wikidata, we present a user-friendly interface that allows flexible computation of similarity between Qnodes in Wikidata. At present, the similarity interface supports four algorithms based on graph embeddings (TransE, ComplEx), text embeddings (BERT), and class-based similarity. We demonstrate the behavior of the algorithms on representative examples of semantically similar, related, and entirely unrelated entity pairs. To support anticipated applications that require efficient similarity computations, such as entity linking and recommendation, we also provide a REST API that can compute the most similar neighbors for any Qnode in Wikidata.
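A minimal sketch of the kind of nearest-neighbor lookup such an interface could serve: rank entities by cosine similarity between their embedding vectors. The three-dimensional vectors and entity labels below are hypothetical stand-ins for real TransE, ComplEx, or BERT embeddings of Wikidata Qnodes, and the sketch does not reproduce the paper's interface or REST API.

```python
import math

# Hypothetical 3-d embeddings keyed by entity label; real graph or text
# embeddings for Wikidata Qnodes have hundreds of dimensions.
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "lion": [0.7, 0.0, 0.3],
    "car": [0.0, 0.9, 0.4],
}

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def most_similar(node, k=2):
    """Return the k entities closest to `node` by cosine similarity, excluding itself."""
    query = EMBEDDINGS[node]
    scored = [(other, cosine(query, vec)) for other, vec in EMBEDDINGS.items() if other != node]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

print(most_similar("cat"))
```

This exhaustive scan is linear in the number of entities; at Wikidata scale, a service answering neighbor queries efficiently would typically rely on an approximate nearest-neighbor index instead.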

arXiv.org Artificial Intelligence

2108.0541

Country:

Asia > Russia (0.15)
North America > United States > California (0.14)
Europe > Norway (0.06)
Europe > Russia (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.50)


Supervised machine learning techniques for data matching based on similarity metrics

Verschuuren, Pim; Palazzo, Serena; Powell, Tom; Sutton, Steve; Pilgrim, Alfred; Giannelli, Michele Faucci

arXiv.org Machine Learning, Jul-8-2020

Businesses, governmental bodies, and NGOs have an ever-increasing amount of data at their disposal from which they try to extract valuable information. Often, this needs to be done not only accurately but also within a short time frame. Clean and consistent data is therefore crucial. Data matching is the field that tries to identify instances in data that refer to the same real-world entity. In this study, machine learning techniques are combined with string similarity functions and applied to data matching. A dataset of invoices from a variety of businesses and organizations was preprocessed with a grouping scheme to reduce pair dimensionality, and a set of similarity functions was used to quantify similarity between invoice pairs. The resulting invoice-pair dataset was then used to train and validate a neural network and a boosted decision tree. Their performance was compared with a solution from FISCAL Technologies, used as a benchmark for currently available deduplication solutions. Both the neural network and the boosted decision tree showed equal or better performance.
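A minimal sketch of the first two stages of such a pipeline: grouping records to limit the number of candidate pairs, then turning each pair into a vector of similarity features, with Python's `difflib.SequenceMatcher` standing in as the string similarity function. The invoice fields, the grouping key (rounded amount), and the chosen features are hypothetical; the study's actual grouping scheme and similarity functions are not described here. The resulting vectors would then be the input to a classifier such as a neural network or boosted decision tree.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical invoice records; field names are illustrative assumptions.
invoices = [
    {"id": 1, "supplier": "Acme Ltd", "ref": "INV-1001", "amount": 250.00},
    {"id": 2, "supplier": "ACME Limited", "ref": "INV1001", "amount": 250.00},
    {"id": 3, "supplier": "Globex", "ref": "G-77", "amount": 99.50},
]

def block_key(inv):
    # Hypothetical grouping scheme: only invoices with the same rounded
    # amount are compared, which cuts the number of candidate pairs.
    return round(inv["amount"])

def string_sim(a, b):
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def pair_features(a, b):
    """Similarity feature vector for one invoice pair (classifier input)."""
    return [
        string_sim(a["supplier"], b["supplier"]),
        string_sim(a["ref"], b["ref"]),
        abs(a["amount"] - b["amount"]),
    ]

def candidate_pairs(records):
    """Yield all record pairs that share a grouping key."""
    groups = defaultdict(list)
    for inv in records:
        groups[block_key(inv)].append(inv)
    for group in groups.values():
        yield from combinations(group, 2)

pairs = list(candidate_pairs(invoices))
for a, b in pairs:
    print(a["id"], b["id"], pair_features(a, b))
```

Grouping trades recall for speed: true duplicates whose keys differ (for example, a mistyped amount) are never compared, so the key should be chosen to be robust to the errors expected in the data.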

artificial intelligence, invoice, machine learning

arXiv.org Machine Learning

2007.04001

Country:

North America > United States > Florida > Hillsborough County > Tampa (0.04)
Europe > United Kingdom > England > Berkshire > Reading (0.04)

Genre: Research Report (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
